AITopics | neural layer

Collaborating Authors

neural layer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Psychology of Artificial Intelligence: Epistemological Markers of the Cognitive Analysis of Neural Networks

Pichat, Michael

arXiv.org Artificial IntelligenceJul-4-2024

What is the "nature" of the cognitive processes and contents of an artificial neural network? In other words, how does an artificial intelligence fundamentally "think," and in what form does its knowledge reside? The psychology of artificial intelligence, as predicted by Asimov (1950), aims to study this AI probing and explainability-sensitive matter. This study requires a neuronal level of cognitive granularity, so as not to be limited solely to the secondary macro-cognitive results (such as cognitive and cultural biases) of synthetic neural cognition. A prerequisite for examining the latter is to clarify some epistemological milestones regarding the cognitive status we can attribute to its phenomenology.

dimension, neural network, vector space, (16 more...)

arXiv.org Artificial Intelligence

2407.09563

Country:

North America > United States > New York (0.05)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads

Sharma, Harsh, Narang, Gaurav, Doppa, Janardhan Rao, Ogras, Umit, Pande, Partha Pratim

arXiv.org Artificial IntelligenceMar-27-2024

Processing-in-memory (PIM) has emerged as an enabler for the energy-efficient and high-performance acceleration of deep learning (DL) workloads. Resistive random-access memory (ReRAM) is one of the most promising technologies to implement PIM. However, as the complexity of Deep convolutional neural networks (DNNs) grows, we need to design a manycore architecture with multiple ReRAM-based processing elements (PEs) on a single chip. Existing PIM-based architectures mostly focus on computation while ignoring the role of communication. ReRAM-based tiled manycore architectures often involve many Processing Elements (PEs), which need to be interconnected via an efficient on-chip communication infrastructure. Simply allocating more resources (ReRAMs) to speed up only computation is ineffective if the communication infrastructure cannot keep up with it. In this paper, we highlight the design principles of a dataflow-aware PIM-enabled manycore platform tailor-made for various types of DL workloads. We consider the design challenges with both 2.5D interposer- and 3D integration-enabled architectures.

architecture, chiplet, fabrication cost, (14 more...)

arXiv.org Artificial Intelligence

2403.19073

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Washington > Whitman County > Pullman (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees

Zhang, Ling, Tabas, Daniel, Zhang, Baosen

arXiv.org Artificial IntelligenceApr-3-2023

In this paper, we consider the scenario-based two-stage stochastic DC optimal power flow (OPF) problem for optimal and reliable dispatch when the load is facing uncertainty. Although this problem is a linear program, it remains computationally challenging to solve due to the large number of scenarios needed to accurately represent the uncertainties. To mitigate the computational issues, many techniques have been proposed to approximate the second-stage decisions so they can dealt more efficiently. The challenge of finding good policies to approximate the second-stage decisions is that these solutions need to be feasible, which has been difficult to achieve with existing policies. To address these challenges, this paper proposes a learning method to solve the two-stage problem in a more efficient and optimal way. A technique called the gauge map is incorporated into the learning architecture design to guarantee the learned solutions' feasibility to the network constraints. Namely, we can design policies that are feed forward functions that only output feasible solutions. Simulation results on standard IEEE systems show that, compared to iterative solvers and the widely used affine policy, our proposed method not only learns solutions of good quality but also accelerates the computation by orders of magnitude.

artificial intelligence, constraint, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.01409

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report (0.64)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance

Zhang, Guoqiang

arXiv.org Artificial IntelligenceFeb-28-2023

A number of recent adaptive optimizers improve the generalisation performance of Adam by essentially reducing the variance of adaptive stepsizes to get closer to SGD with momentum. Following the above motivation, we suppress the range of the adaptive stepsizes of Adam by exploiting the layerwise gradient statistics. In particular, at each iteration, we propose to perform three consecutive operations on the second momentum v_t before using it to update a DNN model: (1): down-scaling, (2): epsilon-embedding, and (3): down-translating. The resulting algorithm is referred to as SET-Adam, where SET is a brief notation of the three operations. The down-scaling operation on v_t is performed layerwise by making use of the angles between the layerwise subvectors of v_t and the corresponding all-one subvectors. Extensive experimental results show that SET-Adam outperforms eight adaptive optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIAF10 and CIFAR100 while matching the best performance of the eight adaptive methods when training WGAN-GP models for image generation tasks. Furthermore, SET-Adam produces higher validation accuracies than Adam and AdaBelief for training ResNet18 over ImageNet.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2302.01029

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range

Zhang, Guoqiang, Niwa, Kenta, Kleijn, W. Bastiaan

arXiv.org Artificial IntelligenceJan-24-2023

We make contributions towards improving adaptive-optimizer performance. Our improvements are based on suppression of the range of adaptive stepsizes in the AdaBelief optimizer. Firstly, we show that the particular placement of the parameter epsilon within the update expressions of AdaBelief reduces the range of the adaptive stepsizes, making AdaBelief closer to SGD with momentum. Secondly, we extend AdaBelief by further suppressing the range of the adaptive stepsizes. To achieve the above goal, we perform mutual layerwise vector projections between the gradient g_t and its first momentum m_t before using them to estimate the second momentum. The new optimization method is referred to as Aida. Thirdly, extensive experimental results show that Aida outperforms nine optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIAF10 and CIFAR100 while matching the best performance of the nine methods when training WGAN-GP models for image generation tasks. Furthermore, Aida produces higher validation accuracies than AdaBelief for training ResNet18 over ImageNet. Code is available at this URL

adabelief, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2203.13273

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Tutorial on Neural Networks and Gradient-free Training

Rozario, Turibius, Trivedi, Arjun, Goel, Ankit

arXiv.org Artificial IntelligenceNov-26-2022

This paper presents a compact, matrix-based representation of neural networks in a self-contained tutorial fashion. Specifically, we develop neural networks as a composition of several vector-valued functions. Although neural networks are well-understood pictorially in terms of interconnected neurons, neural networks are mathematical nonlinear functions constructed by composing several vector-valued functions. Using basic results from linear algebra, we represent a neural network as an alternating sequence of linear maps and scalar nonlinear functions, also known as activation functions. The training of neural networks requires the minimization of a cost function, which in turn requires the computation of a gradient. Using basic multivariable calculus results, the cost gradient is also shown to be a function composed of a sequence of linear maps and nonlinear functions. In addition to the analytical gradient computation, we consider two gradient-free training methods and compare the three training methods in terms of convergence rate and prediction accuracy.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2211.17217

Country:

North America > United States > Maryland > Baltimore (0.14)
North America > United States > Maryland > Baltimore County (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.86)

Industry: Education (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Probing Statistical Representations For End-To-End ASR

Ollerenshaw, Anna, Jalal, Md Asif, Hain, Thomas

arXiv.org Artificial IntelligenceNov-3-2022

End-to-End automatic speech recognition (ASR) models aim to learn a generalised speech representation to perform recognition. In this domain there is little research to analyse internal representation dependencies and their relationship to modelling approaches. This paper investigates cross-domain language model dependencies within transformer architectures using SVCCA and uses these insights to exploit modelling approaches. It was found that specific neural representations within the transformer layers exhibit correlated behaviour which impacts recognition performance. Altogether, this work provides analysis of the modelling approaches affecting contextual dependencies and ASR performance, and can be used to create or adapt better performing End-to-End ASR models and also for downstream tasks.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.01993

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)

Genre: Research Report (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.91)

Add feedback

Deep Learning Using TensorFlow Keras - Analytics India Magazine

#artificialintelligenceMay-7-2021, 02:30:06 GMT

Deep Learning is a subset of Machine learning. It was developed to have an architecture and functionality similar to that of a human brain. The human brain is composed of neural networks that connect billions of neurons. Similarly, a deep learning architecture comprises artificial neural networks that connect a number of mathematical units called neurons. Deep Learning is capable of modeling complex problems that, in some cases, exceed human performance!

activation function, neural network, neuron, (13 more...)

#artificialintelligence

Country:

Asia > India (0.40)
North America > United States > California (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration

Ao, Sheng, Hu, Qingyong, Yang, Bo, Markham, Andrew, Guo, Yulan

arXiv.org Artificial IntelligenceNov-24-2020

Extracting robust and general 3D local features is key to downstream tasks such as point cloud registration and reconstruction. Existing learning-based local descriptors are either sensitive to rotation transformations, or rely on classical handcrafted features which are neither general nor representative. In this paper, we introduce a new, yet conceptually simple, neural architecture, termed SpinNet, to extract local features which are rotationally invariant whilst sufficiently informative to enable accurate registration. A Spatial Point Transformer is first introduced to map the input local surface into a carefully designed cylindrical space, enabling end-to-end optimization with SO(2) equivariant representation. A Neural Feature Extractor which leverages the powerful point-based and 3D cylindrical convolutional neural layers is then utilized to derive a compact and representative descriptor for matching. Extensive experiments on both indoor and outdoor datasets demonstrate that SpinNet outperforms existing state-of-the-art techniques by a large margin. More critically, it has the best generalization ability across unseen scenarios with different sensor modalities. The code is available at https://github.com/QingyongHu/SpinNet.

dataset, descriptor, spinnet, (14 more...)

arXiv.org Artificial Intelligence

2011.12149

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Approximated Orthonormal Normalisation in Training Neural Networks

Zhang, Guoqiang, Niwa, Kenta, Kleijn, W. B.

arXiv.org Machine LearningNov-21-2019

Generalisation of a deep neural network (DNN) is one major concern when employing the deep learning approach for solving practical problems. In this paper we propose a new technique, named approximated orthonormal normalisation (AON), to improve the generalisation capacity of a DNN model. Considering a weight matrix W from a particular neural layer in the model, our objective is to design a function h(W) such that its row vectors are approximately orthogonal to each other while allowing the DNN model to fit the training data sufficiently accurate. By doing so, it would avoid co-adaptation among neurons of the same layer to be able to improve network-generalisation capacity. Specifically, at each iteration, we first approximate (WW^T)^(-1/2) using its Taylor expansion before multiplying the matrix W. After that, the matrix product is then normalised by applying the spectral normalisation (SN) technique to obtain h(W). Conceptually speaking, AON is designed to turn orthonormal regularisation into orthonormal normalisation to avoid manual balancing the original and penalty functions. Experimental results show that AON yields promising validation performance compared to orthonormal regularisation.

aon, normalisation, orthonormal regularisation, (13 more...)

arXiv.org Machine Learning

1911.09445

Country:

Oceania > New Zealand > North Island > Wellington Region > Wellington (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.70)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback